NTpred FRAMEWORK FOR TYROSINE NITRATION INFERENCE ON UNSEEN DATA:
Users can paste raw sequences into the text field or upload a FASTA file containing raw sequences.
- Sequences entered in the text field should be raw sequences separated by commas.
- Each record in the input FASTA file should contain a sequence name and a sequence.
- The framework's predictions are saved in a CSV file that can be downloaded.
- The framework predicts tyrosine nitration at every tyrosine in an input sequence, producing one prediction per occurrence of Y, identified by its position in the sequence.
- Let us assume the following sample input sequence in the FASTA file:
> sample_1
ITILSYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS
- In this sample, Y occurs at positions 6, 21, and 25.
- The output prediction CSV file will therefore have 3 rows for this sequence:
- Sequence_name, Sequence, Probability, Class
- sample_1__6, ITILSYHSSIGVRKDELVHGYILVYS, 0.1057, 0
- sample_1__21, ITILSYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS, 0.0611, 0
- sample_1__25, SYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS, 0.06139, 0
- The Sequence_name column holds the name from the input FASTA file with "__location" appended, where location is the position of the Y residue in the protein sequence.
- The CSV file contains four columns: Sequence_name, Sequence, Probability, and Class.
- The Sequence_name column gives the name of the sequence.
- The Sequence column gives the sequence used for the prediction.
- The Probability column gives the predicted probability that the Y residue is a tyrosine nitration site.
- The Class column translates the probability into a class label, where 1 denotes a positive tyrosine nitration site and 0 denotes a negative one.
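The row naming described above can be sketched in a few lines of Python. This is an illustration only, not NTpred's own code: the function name tyrosine_rows and the flank parameter are hypothetical. The default flank of 20 residues on each side of Y is an assumption inferred from the three sample rows, whose Sequence values match such a window truncated at the sequence ends. It is not a documented NTpred setting.

```python
def tyrosine_rows(name, sequence, flank=20):
    """Return (row_name, window) pairs, one per Y residue in the sequence.

    flank=20 is an assumption inferred from the sample output rows;
    it is not a documented NTpred parameter.
    """
    seq = sequence.strip().upper()
    rows = []
    for index, residue in enumerate(seq):
        if residue != "Y":
            continue
        position = index + 1  # positions in the CSV are 1-based
        start = max(0, index - flank)          # truncate at the N-terminus
        end = min(len(seq), index + flank + 1)  # truncate at the C-terminus
        rows.append((f"{name}__{position}", seq[start:end]))
    return rows


rows = tyrosine_rows("sample_1", "ITILSYHSSIGVRKDELVHGYILVYSAKRKASMGMLRAFLS")
for row_name, window in rows:
    print(row_name, window)
```

For the sample sequence this yields the row names sample_1__6, sample_1__21, and sample_1__25, which can be used to match CSV rows back to residues in the original protein.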
TRAINING THE HYBRID ENSEMBLE ARCHITECTURE FROM SCRATCH:
- The NTpred framework can be used to run experiments in k-fold cross-validation and independent test settings. For both settings, training data should be provided in standard FASTA format.
- The FASTA record header should follow the pattern:
- Sequence_Name|Class|Label. For example: sample_1|0|training
- Sequence_Name should be unique.
- Class should be either 1 or 0, marking the sequence as a positive or negative site, respectively.
- Label is an arbitrary placeholder value.
- Users can choose the "Kfold" or "Standard" training mode.
- "Kfold" training mode performs a k-fold evaluation of the NTpred framework on the provided training data.
- "Standard" training mode can be used for the independent test setting. A model is trained on the user-provided training data and deployed, and the user can then use it for prediction.
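Before uploading training data, it may help to check that every FASTA header follows the three-field pattern described above. The following is a minimal sketch under that assumption; the helper names parse_training_header and check_unique_names are hypothetical and are not part of NTpred.

```python
def parse_training_header(header):
    """Split a header of the form Sequence_Name|Class|Label into its fields."""
    fields = header.strip().lstrip(">").strip().split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(fields)}: {header!r}")
    name, class_text, label = fields
    if class_text not in ("0", "1"):
        raise ValueError(f"Class must be 0 or 1, got {class_text!r}")
    return name, int(class_text), label


def check_unique_names(headers):
    """Raise ValueError on a repeated Sequence_Name; return the number of records."""
    seen = set()
    for header in headers:
        name, _, _ = parse_training_header(header)
        if name in seen:
            raise ValueError(f"duplicate Sequence_Name: {name!r}")
        seen.add(name)
    return len(seen)


parsed = parse_training_header("sample_1|0|training")
count = check_unique_names(["sample_1|0|training", "sample_2|1|training"])
```

A header with a Class other than 0 or 1, a missing field, or a repeated Sequence_Name raises ValueError, so problems surface before training starts.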